Abstract

For a Harris ergodic Markov chain $(X_n)_{n\ge 0}$ on a general state space, started from the
so-called small measure or from the stationary distribution, we provide optimal estimates for
Orlicz norms of sums $\sum_{i=0}^{\tau} f(X_i)$, where $\tau$ is the first regeneration time of the chain. The
estimates are expressed in terms of other Orlicz norms of the function $f$ (wrt the stationary
distribution) and the regeneration time $\tau$ (wrt the small measure). We provide applications
to tail estimates for additive functionals of the chain $(X_n)$ generated by unbounded functions
as well as to classical limit theorems (CLT, LIL, Berry-Esseen).
1 Introduction and notation

Consider a Polish space $\mathcal{X}$ with the Borel $\sigma$-field $\mathcal{B}$ and let $(X_n)_{n\ge 0}$ be a time homogeneous
Markov chain on $\mathcal{X}$ with a transition function $P\colon \mathcal{X}\times\mathcal{B}\to[0,1]$. Throughout the article we
will assume that the chain is Harris ergodic, i.e. that there exists a unique probability measure
$\pi$ on $(\mathcal{X},\mathcal{B})$ such that

\[ \|P^n(x,\cdot) - \pi\|_{TV} \to 0 \]

for all $x\in\mathcal{X}$, where $\|\cdot\|_{TV}$ denotes the total variation norm, i.e. $\|\mu\|_{TV} = \sup_{A\in\mathcal{B}}|\mu(A)|$ for
any signed measure $\mu$.

One of the best known and most efficient tools for studying such chains is the so-called regeneration
technique [28], which we briefly recall below. We refer the reader to the monographs
[26] and [12] for extensive description of this method and restrict ourselves to the basics 
which we will need to formulate and prove our results. 

Below we assume that the chain is Harris ergodic. 

One can show that under the above assumptions there exists a set (usually called a small set)
$C \in \mathcal{E}^+ = \{A\in\mathcal{B}\colon \pi(A) > 0\}$, a positive integer $m$, $\delta > 0$ and a Borel probability measure $\nu$
on $\mathcal{X}$ (a small measure) such that

\[ P^m(x,\cdot) \ge \delta\nu(\cdot) \qquad (1) \]

for all $x\in C$. Moreover one can always choose $m$ and $\nu$ in such a way that $\nu(C) > 0$.
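
For a chain on a finite state space and $m = 1$, the minorization condition (1) can be made concrete. The sketch below is purely illustrative: the transition matrix `P`, the set `C` and the helper `minorization` are hypothetical and do not come from the article. The helper takes the pointwise minimum of the rows of `P` indexed by `C`, which gives the largest $\delta$ for which (1) holds with $m = 1$.

```python
from fractions import Fraction


def minorization(P, C):
    """Return (delta, nu) such that P[x][y] >= delta * nu[y] for all x in C (case m = 1)."""
    n = len(P)
    # Pointwise lower envelope of the rows indexed by C.
    lower = [min(P[x][y] for x in C) for y in range(n)]
    delta = sum(lower)
    if delta == 0:
        raise ValueError("C is not a small set for m = 1")
    nu = [v / delta for v in lower]  # normalise the envelope to a probability measure
    return delta, nu


# Hypothetical 3-state chain, used only as an illustration.
P = [
    [Fraction(1, 2), Fraction(1, 2), Fraction(0)],
    [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
    [Fraction(0), Fraction(1, 2), Fraction(1, 2)],
]
C = [0, 1]
delta, nu = minorization(P, C)
```

In this toy example the small measure charges only states 0 and 1, so $\nu(C) = 1 > 0$, in line with the last remark above.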

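Existence of the above objects allows for redefining the chain (possibly on an enlarged
probability space) together with an auxiliary regeneration structure. More precisely, one defines the
sequence $(\tilde X_n)_{n\ge 0}$ and a sequence $(\tilde Y_n)_{n\ge 0}$ by requiring that $\tilde X_0$ have the same distribution as
$X_0$ and specifying the conditional probabilities

\[
\begin{aligned}
&P\big(\tilde Y_k = 1, \tilde X_{km+1}\in dx_1,\ldots,\tilde X_{(k+1)m-1}\in dx_{m-1}, \tilde X_{(k+1)m}\in dy \,\big|\, \mathcal{F}^{\tilde X}_{km}, \mathcal{F}^{\tilde Y}_{k-1}, \tilde X_{km} = x\big) \\
&\quad = P\big(\tilde Y_k = 1, \tilde X_{km+1}\in dx_1,\ldots,\tilde X_{(k+1)m-1}\in dx_{m-1}, \tilde X_{(k+1)m}\in dy \,\big|\, \tilde X_{km} = x\big) \\
&\quad = \mathbf{1}_{\{x\in C\}}\, \frac{\delta\nu(dy)}{P^m(x,dy)}\, P(x,dx_1)\cdots P(x_{m-1},dy),
\end{aligned}
\]

where $\mathcal{F}^{\tilde X}_{km} = \sigma((\tilde X_i)_{i\le km})$ and $\mathcal{F}^{\tilde Y}_{k-1} = \sigma((\tilde Y_i)_{i\le k-1})$.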

One can easily check that $(\tilde X_n)$ has the same distribution as $(X_n)$ and so we may and will
identify the two sequences (we will suppress the tilde). The auxiliary variables $Y_n$ can be used
to introduce some independence, which allows one to recover many results for Markov chains from
corresponding statements for the independent (or one-dependent) case. Indeed, observe that if
we define the stopping times

\[ \tau(0) = \inf\{k\ge 0\colon Y_k = 1\}, \quad \tau(i) = \inf\{k > \tau(i-1)\colon Y_k = 1\}, \quad i = 1,2,\ldots, \]

then the blocks $R_0 = (X_0,\ldots,X_{\tau(0)m+m-1})$, $R_i = (X_{m(\tau(i-1)+1)},\ldots,X_{m\tau(i)+m-1})$ are
one-dependent, i.e. for all $k$, $\sigma(R_i, i<k)$ is independent of $\sigma(R_i, i>k)$. In the special case when
$m = 1$ (the so-called strongly aperiodic case) the blocks $R_i$ are independent. Moreover, for $i\ge 1$
the blocks $R_i$ form a stationary sequence.
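
The index bookkeeping behind the regeneration times and the blocks can be sketched in a few lines. This is only an illustration of the definitions above: the function names and the toy sequences `X`, `Y` are hypothetical, and the trajectory is taken as given rather than simulated from a split chain.

```python
def regeneration_times(Y):
    """Indices k with Y_k = 1, i.e. tau(0) < tau(1) < ... for the observed segment."""
    return [k for k, y in enumerate(Y) if y == 1]


def blocks(X, Y, m):
    """Cut X into R_0 = (X_0..X_{tau(0)m+m-1}), R_i = (X_{m(tau(i-1)+1)}..X_{m tau(i)+m-1})."""
    out = []
    start = 0
    for t in regeneration_times(Y):
        end = m * t + m - 1          # last index of the current block
        if end >= len(X):            # incomplete final block: stop
            break
        out.append(X[start:end + 1])
        start = m * (t + 1)          # first index of the next block
    return out


# Toy data: X is just a list of labels, Y marks the regeneration events.
X = list(range(12))
Y = [0, 1, 0, 0, 1, 1]
m = 2
R = blocks(X, Y, m)
```

Each block after the first has length $m(\tau(i) - \tau(i-1))$, which is why integrability of the regeneration time enters all the estimates that follow.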

In particular for any function $f\colon \mathcal{X}\to\mathbb{R}$, the corresponding additive functional $\sum_{i=1}^{n} f(X_i)$
can be split (modulo the initial and final segment) into a sum (of random length) of
one-dependent (independent for $m = 1$) identically distributed summands

\[ s_i(f) = \sum_{j=m(\tau(i)+1)}^{m\tau(i+1)+m-1} f(X_j). \]

A crucial and very useful fact is the following equality, which follows from Pitman's
occupation measure formula ([33, 34], see also Theorem 10.0.1 in [26]):

\[ E_\nu \sum_{i=0}^{\tau(0)} F(X_{mi}, Y_i) = \delta^{-1}\pi(C)^{-1} E_\pi F(X_0, Y_0), \qquad (2) \]

where by $E_\mu$ we denote the expectation for the process with $X_0$ distributed according to the
measure $\mu$.
It is also worth noting that the distribution of $s_i(f)$ is equal to the distribution of

\[ S = S(f) = \sum_{i=0}^{\tau(0)m+m-1} f(X_i), \]

provided that $X_0$ is distributed according to $\nu$.
In particular, by (2) this easily implies that

\[ E s_i(f) = \delta^{-1}\pi(C)^{-1} m \int_{\mathcal{X}} f\,d\pi. \qquad (3) \]



The above technique of decomposing additive functionals of Markov chains into indepen- 
dent or almost independent summands has proven to be very useful in studying limit theorems 
for Markov chains (see e.g. [22 [23 E2J O O EH EZ]) as well as in obtaining non-asymptotic 
concentration inequalities (see e.g. [131 US E E])- The basic difficulty of this approach is pro- 
viding proper integrability for the variable S. This is usually achieved either via pointwise drift 
conditions (e.g. (2S1 El El G] ) j especially important in Markov Chain Monte Carlo algorithms 
or other statistical applications, when not much information regarding the behaviour of / with 
respect to the stationary measure is available. Such drift conditions are also useful for
quantifying the ergodicity of the chain, measured in terms of integrability of the regeneration time
$T = \tau(1) - \tau(0)$ (which via coupling constructions can be translated into the language of total
variation norms or mixing coefficients).

Another line of research is more theoretic and concerns the behaviour of the stationary chain. 
It is then natural to impose conditions concerning integrability of / with respect to the measure 
7r and to assume some order of ergodicity of the chain. 

Classical assumptions about integrability of $T$ are of the form $E T^\alpha < \infty$ or $E\exp(\theta T) < \infty$,
which correspond to polynomial or geometric ergodicity of the chain. However, recently new
modified drift conditions have been introduced [15, 14], which give other orders of integrability
of $T$ corresponding to various subgeometric rates of ergodicity. Chains satisfying such drift
conditions appear naturally in Markov Chain Monte Carlo algorithms or analysis of nonlinear
autoregressive models [15].

From this point of view it is natural to ask questions concerning more general notions of
integrability of the variable $S$. In this note we will focus on Orlicz integrability. Recall that
$\varphi\colon[0,\infty)\to\mathbb{R}_+$ is called a Young function if it is strictly increasing, convex and $\varphi(0) = 0$. For
a real random variable $X$ we define the Orlicz norm corresponding to $\varphi$ as

\[ \|X\|_\varphi = \inf\{C > 0\colon E\varphi(|X|/C) \le 1\}. \]

The Orlicz space associated to $\varphi$ is the set $L_\varphi$ of random variables $X$ such that $\|X\|_\varphi < \infty$.

In what follows, we will deal with various underlying measures on the state space $\mathcal{X}$ or
on the space of trajectories of the chain. To stress the dependence of the Orlicz norm on the
initial distribution $\mu$ of the chain $(X_n)$ we will denote it by $\|\cdot\|_{\mu,\varphi}$, e.g. $\|S\|_{\pi,\varphi}$ will denote the
$\varphi$-Orlicz norm of the functional $S$ for the stationary chain, whereas $\|S\|_{\nu,\varphi}$ denotes the $\varphi$-Orlicz norm
of the same functional for the chain started from the initial distribution $\nu$. We will also denote by
$\|f\|_{\mu,\rho}$ the $\rho$-Orlicz norm of the function $f\colon\mathcal{X}\to\mathbb{R}$ when the underlying probability measure is
$\mu$. Although the notation is the same for Orlicz norms of functionals of the Markov chains and
functions on $\mathcal{X}$, the meaning will always be clear from the context and thus should not lead to
misunderstanding.
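
To make the definition of the Orlicz norm tangible, here is a minimal numerical sketch for a finitely supported random variable. It is an illustration under stated assumptions, not part of the article: the function name `orlicz_norm`, the bisection tolerance and the sample distribution are all hypothetical. The code uses the fact that $C\mapsto E\varphi(|X|/C)$ is nonincreasing, so the infimum can be located by bisection.

```python
import math


def orlicz_norm(values, probs, phi, tol=1e-12):
    """Approximate inf{C > 0 : E phi(|X|/C) <= 1} for a finitely supported X."""
    def mean_phi(c):
        return sum(p * phi(abs(v) / c) for v, p in zip(values, probs))

    if all(v == 0 for v in values):
        return 0.0
    lo, hi = 0.0, 1.0
    while mean_phi(hi) > 1.0:        # enlarge hi until it is admissible
        hi *= 2.0
    while hi - lo > tol * hi:        # bisection on the monotone map c -> E phi(|X|/c)
        mid = 0.5 * (lo + hi)
        if mean_phi(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi


# For phi(x) = x^2 the Orlicz norm is the usual L_2 norm.
norm_l2 = orlicz_norm([1.0, 2.0], [0.5, 0.5], lambda t: t * t)
```

For $\varphi(x) = x^2$ the result agrees with $(EX^2)^{1/2}$, which gives a quick sanity check of the routine.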

Remarks 1. Note that the distribution of $T$ is independent of the initial distribution of the
chain and is equal to the distribution of $\tau(0) + 1$ for the chain starting from the measure $\nu$. Thus

\[ \|T\|_\psi = \|\tau(0) + 1\|_{\nu,\psi}. \]

2. In [31], the authors consider ergodicity of order $\psi$ of a Markov chain, for a special
class of nondecreasing functions $\psi\colon\mathbb{N}\to\mathbb{R}_+$. They call a Markov chain ergodic of order $\psi$ iff
$E_\nu\psi^0(T) < \infty$, where $\psi^0(n) = \sum_{i=1}^n \psi(i)$. Since $\psi^0$ can be extended to a convex increasing
function, one can easily see that this notion is closely related to the finiteness of a proper Orlicz
norm of $T$ (related to a properly shifted function $\psi^0$).

We will be interested in the following two closely related questions 

Question 1 Given two Young functions $\varphi$ and $\psi$ and a Markov chain $(X_n)$ such that $\|T\|_\psi < \infty$,
what do we have to assume about $f\colon\mathcal{X}\to\mathbb{R}$ to guarantee that $\|S\|_{\nu,\varphi} < \infty$ (resp. $\|S\|_{\pi,\varphi} < \infty$)?

Question 2 Given two Young functions $\rho$ and $\psi$, a Markov chain $(X_n)$ such that $\|T\|_\psi < \infty$
and $f\colon\mathcal{X}\to\mathbb{R}$ such that $\|f\|_{\pi,\rho} < \infty$, what can we say about the integrability of $S$ for the
chain started from $\nu$ or from $\pi$?

As it turns out, the answers to both questions are surprisingly explicit and elementary. We 
present them in Section 2 (Theorems 2 and 9, Corollaries 6 and 14). The upper estimates have very
short proofs, which rely only on elementary properties of Orlicz functions and the formula (3).
They are also optimal, as can be seen from Propositions 3, 10 and Theorem 4, proven in Section
3 by constructing a general class of examples.

We would like to stress that despite being elementary, both the estimates and the coun- 
terexamples have non-trivial applications (some of which we present in the last section) and 
therefore are of considerable interest. For example when specialized to $\varphi(x) = x^2$, the estimates
give optimal conditions for the CLT or LIL for Markov chains under assumptions concerning 
the rate of ergodicity and integrability of the test functions in the stationary case. 

In the following sections of the article we present the estimates, demonstrate their optimality 
and provide applications to limit theorems and tail estimates. For the reader's convenience we 
gather all the basic facts about Orlicz spaces which are used in the course of the proof in the 
appendix (we refer the reader to the monographs [211 [Ml EH] for more detailed account on this 
class of Banach spaces). 

2 Main estimates 

To simplify the notation, in what follows we will write $\tau$ instead of $\tau(0)$.

2.1 The chain started from $\nu$

Assumption (A) We will assume that

\[ \lim_{x\to 0}\psi(x)/x = 0 \quad\text{and}\quad \psi(1)\ge 1. \]

Since any Young function on a probability space is equivalent to a function satisfying this
condition (see the definition of domination and equivalence of functions below), it will not decrease the
generality of our estimates while allowing us to describe them in a more concise manner. In
particular it assures the correctness of the following definition (where by a generalized Young function
we mean a nondecreasing convex function $\rho\colon[0,\infty)\to[0,\infty]$ with $\rho(0) = 0$, $\lim_{x\to\infty}\rho(x) = \infty$).

Definition 1. Let $\varphi$ and $\psi$ be Young functions. Assume that $\psi$ satisfies the assumption (A).
Define the generalized Young function $\rho = \rho_{\varphi,\psi}$ by the formula

\[ \rho(x) = \sup_{y>0}\frac{\varphi(xy)-\psi(y)}{y}. \]
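
The supremum defining $\rho_{\varphi,\psi}$ can be explored numerically. The following sketch is an illustration only: the helper `rho`, the logarithmic grid and the choice $\varphi(x) = x^2$, $\psi(x) = x^4$ are assumptions made for this example. For power functions $\varphi(x) = x^p$, $\psi(x) = x^r$ with $r > p$, maximizing $x^p y^{p-1} - y^{r-1}$ over $y$ gives a multiple of $x^{p(r-1)/(r-p)}$, and the code compares the empirical growth exponent with this value.

```python
import math


def rho(x, phi, psi, y_grid):
    """Grid approximation of sup_{y>0} (phi(x*y) - psi(y)) / y."""
    return max((phi(x * y) - psi(y)) / y for y in y_grid)


p, r = 2.0, 4.0
phi = lambda t: t ** p
psi = lambda t: t ** r   # psi(x)/x -> 0 as x -> 0 and psi(1) = 1, so (A) holds

# Logarithmic grid of y values between e^-10 and e^10 (an arbitrary illustrative choice).
y_grid = [math.exp(-10.0 + 20.0 * k / 20000) for k in range(20001)]

x1, x2 = 10.0, 100.0
slope = math.log(rho(x2, phi, psi, y_grid) / rho(x1, phi, psi, y_grid)) / math.log(x2 / x1)
predicted = p * (r - 1) / (r - p)   # exponent obtained from the first-order condition
```

With these parameters the exact value is $\rho(x) = 2x^3/(3\sqrt{3})$, so the estimated slope should be close to 3.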

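Theorem 2. Let $\varphi$ and $\psi$ be Young functions. Assume that $\psi$ satisfies the assumption (A).
Let $\rho = \rho_{\varphi,\psi}$. Then for any Harris ergodic Markov chain $(X_n)$, a small set $C$ and a measure $\nu$
satisfying (1) we have

\[ \Big\|\sum_{j=0}^{m\tau+m-1} f(X_j)\Big\|_{\nu,\varphi} \le 2m\,\|\tau+1\|_{\nu,\psi}\,\|f\|_{\pi,\rho}. \qquad (4) \]

Proof. Let $a = \|\tau+1\|_{\nu,\psi}$, $b = \|f\|_{\pi,\rho}$. We have

\[
\begin{aligned}
E_\nu\varphi\Big(\frac{|S|}{abm}\Big)
&\le E_\nu \sum_{j=0}^{m\tau+m-1} \varphi\Big(\frac{|f(X_j)|(\tau+1)}{ab}\Big)\frac{1}{(\tau+1)m} \\
&\le E_\nu \sum_{j=0}^{m\tau+m-1} \rho\Big(\frac{|f(X_j)|}{b}\Big)\frac{1}{am}
 + E_\nu \sum_{j=0}^{m\tau+m-1} \psi\Big(\frac{\tau+1}{a}\Big)\frac{1}{(\tau+1)m} \\
&= \delta^{-1}\pi(C)^{-1}a^{-1}E_\pi\rho(|f|b^{-1}) + E_\nu\psi((\tau+1)a^{-1}),
\end{aligned}
\]

where the first inequality follows from Jensen's inequality, the second one from the definition of
the function $\rho$ (which gives $\varphi(xy)\le\rho(x)y+\psi(y)$) and the last equality from (3). Let us now notice that another application of (3)
gives

\[ E_\nu(\tau+1) = \delta^{-1}\pi(C)^{-1}. \]

Thanks to the assumption $\psi(1)\ge 1$, we have $E_\nu\psi((\tau+1)\delta\pi(C)) \ge \psi(E_\nu(\tau+1)\delta\pi(C)) = \psi(1) \ge 1$,
which implies that $a \ge \delta^{-1}\pi(C)^{-1}$. Combined with the definition of $a$ and $b$ this gives

\[ E_\nu\varphi\Big(\frac{|S|}{abm}\Big) \le 2, \]

and hence $E_\nu\varphi(|S|/(2abm)) \le 2^{-1}E_\nu\varphi(|S|/(abm)) \le 1$, which ends the proof. □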
As one can see, the proof is very simple. At the same time, it turns out that the estimate
given in Theorem 2 is optimal (up to constants) and thus answers completely Question 1 for the
chain starting from $\nu$. Below we present two results on optimality of Theorem 2 whose proofs
are postponed to the next section.

Domination and equivalence of functions Consider two functions $\rho_1,\rho_2\colon[0,\infty)\to[0,\infty]$.
As is classical in the theory of Orlicz spaces with respect to probabilistic measures, we say that
$\rho_2$ dominates $\rho_1$ (denoted by $\rho_1\preceq\rho_2$) if there exist positive constants $C_1$, $C_2$ and $x_0$ such that

\[ \rho_1(x) \le C_1\rho_2(C_2x) \qquad (5) \]

for $x\ge x_0$. One can easily check that if $\rho_i$ are Young functions then $\rho_1\preceq\rho_2$ iff there is an
inclusion and comparison of norms between the corresponding Orlicz spaces. We will say that
$\rho_1$ and $\rho_2$ are equivalent ($\rho_1\simeq\rho_2$) iff $\rho_1\preceq\rho_2$ and $\rho_2\preceq\rho_1$. One can also easily check that two
Young functions are equivalent iff they define equivalent Orlicz norms (and the same remains
true for functions equivalent to Young functions). Note also that if (5) holds and $\rho_2$ is a Young
function then $\rho_1(x)\le\rho_2(\max(C_1,1)C_2x)$.

Our first optimality result is 

Proposition 3 (Weak optimality of Theorem 2). Let $\varphi$ and $\psi$ be as in Theorem 2. Assume
that a Young function $\rho$ has the property that for every Harris ergodic chain $(X_n)$, a small set
$C$, a small measure $\nu$ with $\|\tau\|_{\nu,\psi} < \infty$ and every function $f\colon\mathcal{X}\to\mathbb{R}$ such that $\|f\|_{\pi,\rho} < \infty$, we
have $\|S(f)\|_{\nu,\varphi} < \infty$. Then $\rho_{\varphi,\psi}\preceq\rho$.

It turns out that if we assume something more about the functions $\varphi$ and $\psi$, the above
proposition can be considerably strengthened.

Theorem 4 (Strong optimality of Theorem 2). Let $\varphi$, $\psi$ and $\rho$ be as in Theorem 2. Assume
additionally that $\varphi^{-1}\circ\psi$ is equivalent to a Young function. Let $Y$ be a random variable such
that $\|Y\|_\rho = \infty$. Then there exists a Harris ergodic Markov chain $(X_n)$ on some Polish space
$\mathcal{X}$, with stationary distribution $\pi$, a small set $C$, a small measure $\nu$ and a function $f\colon\mathcal{X}\to\mathbb{R}$,
such that the distribution of $f$ under $\pi$ is equal to the law of $Y$, $\|\tau\|_{\nu,\psi} < \infty$ and $\|S(f)\|_{\nu,\varphi} = \infty$.

Remarks 1. In the last section we will see that the above theorem for $\varphi(x) = x^2$ can be used
to construct examples of chains violating the central limit theorem.

2. We do not know if the additional assumption on convexity of $\varphi^{-1}\circ\psi$ is needed in the above
Theorem.

3. In fact in the construction we provide the set $C$ is an atom for the chain (i.e. in the
minorization condition $m = 1$ and $\delta = 1$).

The above results give a fairly complete answer to Question 1 for a chain started from a 
small measure. We will now show that Theorem [2] can be also used to derive the answer to 
Question 2. 
Recall that the Legendre transform of a function $\rho\colon[0,\infty)\to\mathbb{R}_+$ is defined as
$\rho^*(x) = \sup\{xy - \rho(y)\colon y\ge 0\}$. Our answer to Question 2 is based on the following observation (which will also
be used in the proof of Theorem 4).

Proposition 5. For any Young functions $\varphi,\psi$ satisfying Assumption (A), the function $\rho = \rho_{\varphi,\psi}$
is equivalent to $\eta^*$, where $\eta = (\psi^*)^{-1}\circ\varphi^*$. More precisely, for any $x\ge 0$,

\[ 2\eta^*(2^{-1}x) \le \rho(x) \le 2^{-1}\eta^*(2x). \qquad (6) \]

Before we prove the proposition let us derive the immediate corollary, whose optimality will 
also be shown in the next section. 

Corollary 6. Let $\rho$ and $\psi$ be two Young functions. Assume that $\psi$ satisfies the assumption (A).
Then for any Harris ergodic Markov chain, small set $C$, small measure $\nu$ and any $f\colon\mathcal{X}\to\mathbb{R}$
we have

\[ \|S\|_{\nu,\varphi} \le 4m\,\|\tau+1\|_{\nu,\psi}\,\|f\|_{\pi,\rho}, \]

where $\varphi = (\psi^*\circ\rho^*)^*$.

Proof of Proposition 5. Using the fact that $\varphi^{**} = \varphi$ we get

\[
\rho(x) = \sup_{y>0}\frac{\varphi(xy)-\psi(y)}{y}
= \sup_{y>0}\sup_{z\ge 0}\frac{xyz - \varphi^*(z) - \psi(y)}{y}
= \sup_{z\ge 0}\Big(xz - \inf_{y>0}\frac{\varphi^*(z)+\psi(y)}{y}\Big) = \tilde\eta^*(x),
\]

where $\tilde\eta(z) = \inf_{y>0}(\varphi^*(z)+\psi(y))y^{-1}$. Note that as a function of $y$, $\varphi^*(z)y^{-1}$ decreases whereas
$\psi(y)y^{-1}$ increases, so for all $z\ge 0$ we have

\[ \frac{\varphi^*(z)}{y_0} \le \tilde\eta(z) \le \frac{2\varphi^*(z)}{y_0}, \]

where $y_0$ is defined by the equation $\varphi^*(z) = \psi(y_0)$, i.e. $y_0 = \psi^{-1}(\varphi^*(z))$. In combination with
Lemma 22 from the Appendix, this yields

\[ \frac{1}{2}\eta(z) \le \tilde\eta(z) \le 2\eta(z), \]

which easily implies that $2\eta^*(x/2) \le \rho(x) \le 2^{-1}\eta^*(2x)$ and thus ends the proof. □

We also have the following Proposition whose proof is deferred to Section 3.

Proposition 7. Let $\psi$ and $\rho$ be as in Corollary 6 and let $\varphi$ be a Young function such that for
every Markov chain $(X_n)$, small set $C$, small measure $\nu$ and $f\colon\mathcal{X}\to\mathbb{R}$ with $\|\tau\|_{\nu,\psi} < \infty$ and
$\|f\|_{\pi,\rho} < \infty$ we have $\|S\|_{\nu,\varphi} < \infty$. Then $\varphi\preceq(\psi^*\circ\rho^*)^*$.
Examples Let us now take a closer look at consequences of our theorems for classical Young
functions. The following examples are straightforward and rely only on theorems presented 
in the last two sections and elementary formulas for Legendre transforms of classical Young 
functions. The formulas we present here will be used in Section 21 We also note that below 
we consider functions of the form $x\mapsto\exp(x^\alpha) - 1$ for $\alpha\in(0,1)$. Formally such functions are
not Young functions but it is easy to see that they can be modified for small values of $x$ in
such a way that they become Young functions. It is customary to define
$\|X\|_{\psi_\alpha} = \inf\{C > 0\colon E\exp((|X|/C)^\alpha) \le 2\}$. Under such definition $\|\cdot\|_{\psi_\alpha}$ is a quasi-norm, which can be shown to
be equivalent to the Orlicz norm corresponding to the modified function.

1. If $\varphi(x) = x^p$ and $\psi(x) = x^r$, where $r > p > 1$, then $\rho_{\varphi,\psi}(x) \simeq x^{p(r-1)/(r-p)}$.

2. If $\varphi(x) = \exp(x^\alpha) - 1$ and $\psi(x) = \exp(x^\beta) - 1$, where $\beta > \alpha$, then $\rho_{\varphi,\psi}(x) \simeq \exp(x^{\alpha\beta/(\beta-\alpha)}) - 1$.

3. If $\varphi(x) = x^p$ and $\psi(x) = \exp(x^\beta) - 1$, where $\beta > 0$, then $\rho_{\varphi,\psi}(x) \simeq x^p\log^{(p-1)/\beta}x$.

4. If $\psi(x) = x^r$ and $\rho(x) = x^p$, then $\varphi(x) \simeq x^{rp/(r+p-1)}$.

5. If $\psi(x) = \exp(x^\beta) - 1$ and $\rho(x) = \exp(x^\alpha) - 1$ ($\alpha,\beta > 0$), then $\varphi(x) \simeq \exp(x^{\alpha\beta/(\alpha+\beta)}) - 1$.

6. If $\psi(x) = \exp(x^\beta) - 1$ ($\beta > 0$) and $\rho(x) = x^p$ ($p > 1$), then $\varphi(x) \simeq \dfrac{x^p}{\log^{(p-1)/\beta}x}$.



2.2 The stationary case 

We will now present answers to questions 1 and 2 in the stationary case. Let us start with the 
following 

Definition 8. Let $\varphi$ and $\psi$ be Young functions. Assume that $\lim_{x\to 0}\psi(x)/x = 0$ and define the
generalized Young function $\zeta = \zeta_{\varphi,\psi}$ by the formula

\[ \zeta(x) = \sup_{y>0}\big(\varphi(xy) - y^{-1}\psi(y)\big). \]

The function $\zeta$ will play in the stationary case a role analogous to the one of the function $\rho$ for
the chain started from the small measure.

Theorem 9. Let $\varphi$ and $\psi$ be Young functions, $\lim_{x\to 0}\psi(x)/x = 0$. Let $\zeta = \zeta_{\varphi,\psi}$. Then for any
Harris ergodic Markov chain $(X_n)$, small set $C$ and small measure $\nu$ we have

\[ \Big\|\sum_{j=0}^{m\tau+m-1} f(X_j)\Big\|_{\pi,\varphi} \le m\,\|\tau+1\|_{\nu,\psi}\big(1 + \delta\pi(C)\|\tau+1\|_{\nu,\psi}\big)\|f\|_{\pi,\zeta}. \qquad (7) \]

Proof. The proof is very similar to the proof of Theorem 2, however it involves one more use of
Pitman's formula to pass from the stationary case to the case of the chain started from $\nu$.
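Consider any functional $F\colon\mathcal{X}^{\mathbb{N}}\times\{0,1\}^{\mathbb{N}}\to\mathbb{R}$ (measurable wrt the product $\sigma$-field) on the
space of the trajectories of the process $(X_n,Y_n)_{n\ge 0}$ (recall from the introduction that we identify
$\tilde X_n$ and $X_n$). By the definition of the split chain we have for any $i\in\mathbb{N}$,

\[ E\big(F((X_j)_{j\ge im},(Y_j)_{j\ge i})\,\big|\,\mathcal{F}^X_{im},\mathcal{F}^Y_i\big) = G(X_{im},Y_i), \]

where $G(x,y) = E_{(x,y)}F((X_j)_{j\ge 0},(Y_j)_{j\ge 0}) = E\big(F((X_j)_{j\ge 0},(Y_j)_{j\ge 0})\,\big|\,X_0 = x, Y_0 = y\big)$. In particular
for the functional

\[ F((X_j)_{j\ge 0},(Y_j)_{j\ge 0}) = \varphi\Big((abm)^{-1}\Big|\sum_{j=0}^{m\tau+m-1} f(X_j)\Big|\Big), \]

where $a = \|\tau+1\|_{\nu,\psi}$ and $b = \|f\|_{\pi,\zeta}$, we have

\[
\begin{aligned}
E_\pi\varphi\Big((abm)^{-1}\Big|\sum_{j=0}^{m\tau+m-1} f(X_j)\Big|\Big)
&= E_\pi G(X_0,Y_0) = \delta\pi(C)E_\nu\sum_{i=0}^{\tau} G(X_{im},Y_i) \\
&= \delta\pi(C)\sum_{i=0}^{\infty} E_\nu G(X_{im},Y_i)\mathbf{1}_{\{i\le\tau\}}
 = \delta\pi(C)\sum_{i=0}^{\infty} E_\nu E\big(F((X_j)_{j\ge im},(Y_j)_{j\ge i})\,\big|\,\mathcal{F}^X_{im},\mathcal{F}^Y_i\big)\mathbf{1}_{\{i\le\tau\}} \\
&= \delta\pi(C)\sum_{i=0}^{\infty} E_\nu\varphi\Big((abm)^{-1}\Big|\sum_{j=im}^{m\tau+m-1} f(X_j)\Big|\Big)\mathbf{1}_{\{i\le\tau\}}
 = \delta\pi(C)E_\nu\sum_{i=0}^{\tau}\varphi\Big((abm)^{-1}\Big|\sum_{j=im}^{m\tau+m-1} f(X_j)\Big|\Big) \\
&\le \delta\pi(C)E_\nu\sum_{i=0}^{\tau}\sum_{j=im}^{m\tau+m-1}\frac{1}{m(\tau-i+1)}\varphi\big((ab)^{-1}(\tau-i+1)|f(X_j)|\big) \\
&\le \delta\pi(C)E_\nu\sum_{j=0}^{m\tau+m-1}\frac{\lfloor j/m\rfloor+1}{m(\tau+1)}\varphi\big((ab)^{-1}(\tau+1)|f(X_j)|\big),
\end{aligned}
\]

where the second equality follows from (2) and the two last inequalities from the convexity of $\varphi$.
We thus obtain

\[
\begin{aligned}
E_\pi\varphi\big((abm)^{-1}|S(f)|\big)
&\le \delta\pi(C)m^{-1}E_\nu\sum_{j=0}^{m\tau+m-1}\varphi\big((ab)^{-1}(\tau+1)|f(X_j)|\big) \\
&\le \delta\pi(C)m^{-1}E_\nu\sum_{j=0}^{m\tau+m-1}\zeta\big(b^{-1}|f(X_j)|\big) + \delta\pi(C)\,a\,E_\nu\psi\big(a^{-1}(\tau+1)\big) \\
&= E_\pi\zeta\big(b^{-1}|f(X_0)|\big) + \delta\pi(C)\,a\,E_\nu\psi\big(a^{-1}(\tau+1)\big) \le 1 + \delta\pi(C)a,
\end{aligned}
\]

which ends the proof. □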
Remark The dependence of the estimates presented in the above theorem on $\|\tau+1\|_{\nu,\psi}$ cannot
be improved in the case of general Orlicz functions, since for $\varphi(x) = x$, $\psi(x) = x^2$, and $f\equiv 1$ we
have $\|S(f)\|_{\pi,\varphi} = E_\pi(\tau+1) \simeq E_\nu(\tau+1)^2 = \|\tau+1\|_{\nu,\psi}^2$. However under additional assumptions
on the growth of $\varphi$ one can obtain a better estimate and replace the factor $1 + \delta\pi(C)\|\tau+1\|_{\nu,\psi}$
by $g(1 + \delta\pi(C)\|\tau+1\|_{\nu,\psi})$, where $g(r) = \sup_{x>0} x/\varphi^{-1}(\varphi(x)/r)$. For rapidly growing $\varphi$ and
large $\|\tau+1\|_{\nu,\psi}$ this may be an important improvement. It is also elementary to check that for
$\varphi(x) = \exp(x^\alpha) - 1$, we can use $g(r) \simeq \log^{1/\alpha}(r)$.

Just as in the case of Theorem 2, the estimates given in Theorem 9 are optimal. Below we
state the corresponding optimality results, deferring their proofs to Section 3.

Proposition 10 (Weak optimality of Theorem 9). Let $\varphi$ and $\psi$ be as in Theorem 9. Assume
that a Young function $\zeta$ has the property that for every Harris ergodic chain $(X_n)$, small set $C$
and small measure $\nu$ with $\|\tau\|_{\nu,\psi} < \infty$ and every function $f\colon\mathcal{X}\to\mathbb{R}$ such that $\|f\|_{\pi,\zeta} < \infty$ we
have $\|S(f)\|_{\pi,\varphi} < \infty$. Then $\zeta_{\varphi,\psi}\preceq\zeta$.

Theorem 11 (Strong optimality of Theorem 9). Let $\varphi$, $\psi$ and $\zeta$ be as in Theorem 9. Let
$\tilde\psi(x) = \psi(x)/x$ and assume additionally that the function $\eta = \varphi^{-1}\circ\tilde\psi$ is equivalent to a Young
function. Let $Y$ be a random variable such that $\|Y\|_\zeta = \infty$. Then there exists a Harris ergodic
Markov chain $(X_n)$ on some Polish space $\mathcal{X}$ with stationary distribution $\pi$, small set $C$, small
measure $\nu$ and a function $f\colon\mathcal{X}\to\mathbb{R}$, such that the distribution of $f$ under $\pi$ is equal to the law
of $Y$, $\|\tau\|_{\nu,\psi} < \infty$ and $\|S(f)\|_{\pi,\varphi} = \infty$.

Proposition 12. For any Young functions $\varphi$, $\psi$ such that $\lim_{x\to 0}\psi(x)/x = 0$, the function
$\zeta = \zeta_{\varphi,\psi}$ is equivalent to $\varphi\circ\eta^*$, where $\eta(x) = \varphi^{-1}(\psi(x)/x)$. More precisely, for all $x\ge 0$,

\[ \varphi(\eta^*(x)) \le \zeta(x) \le \frac{1}{2}\varphi(\eta^*(2x)). \]

Proof. Thanks to the assumption on $\psi$, we have $\lim_{y\to 0}\big(\varphi(xy) - \psi(y)/y\big) = 0$, so we can restrict
our attention to $y > 0$ such that $\varphi(xy) > \psi(y)/y$ (note that if there are no such $y$, then
$\eta^*(x) = \zeta(x) = 0$ and the inequalities of the proposition are trivially true). For such $y$, by
convexity of $\varphi$ we obtain

\[ \varphi(xy) - \psi(y)/y \ge \varphi\big(xy - \varphi^{-1}(\psi(y)/y)\big) \]

and

\[ \varphi(xy) - \psi(y)/y \le \varphi(xy) - \psi(y)/(2y) \le \frac{1}{2}\varphi\big(2xy - \varphi^{-1}(\psi(y)/y)\big), \]

which, by taking the supremum over $y$, proves the proposition. □

Lemma 13. Assume that $\zeta$ and $\psi$ are Young functions, $\tilde\psi(x) = \psi(x)/x$ is strictly increasing
and $\tilde\psi(0) = 0$, $\tilde\psi(\infty) = \infty$. Let the function $\kappa$ be defined by

\[ \kappa^{-1}(x) = \zeta^{-1}(x)\tilde\psi^{-1}(x) \qquad (8) \]

for all $x\ge 0$. Then there exist constants $K, x_0\in(0,\infty)$ such that for all $x\ge x_0$,

\[ K^{-1}x \le (\vartheta^*)^{-1}(x)\,\vartheta^{-1}(x) \le 2x, \qquad (9) \]

where $\vartheta = \kappa^{-1}\circ\tilde\psi$.

Moreover the function $\tilde\zeta = \kappa\circ\vartheta^*$ is equivalent to $\zeta$.

Proof. Note first that $\vartheta(x) = \zeta^{-1}(\tilde\psi(x))x$ and so $\vartheta$ is equivalent to a Young function (e.g. by
Lemma 20 in the Appendix). The inequalities (9) follow now by Lemma 21 from the Appendix.
Moreover

\[ \tilde\zeta^{-1}(x) = (\vartheta^*)^{-1}(\kappa^{-1}(x)) \]

and thus by (9) for $x$ sufficiently large,

\[ K^{-1}\zeta^{-1}(x) = K^{-1}\frac{\kappa^{-1}(x)}{\tilde\psi^{-1}(x)} \le \tilde\zeta^{-1}(x) \le 2\frac{\kappa^{-1}(x)}{\tilde\psi^{-1}(x)} = 2\zeta^{-1}(x), \]

which clearly implies that $\tilde\zeta(K^{-1}x) \le \zeta(x) \le \tilde\zeta(2x)$ for $x$ large enough. □

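If now $\varphi$ is a Young function such that $\varphi\preceq\kappa$, then $\varphi\circ(\varphi^{-1}\circ\tilde\psi)^* \preceq \kappa\circ(\kappa^{-1}\circ\tilde\psi)^* \simeq \zeta$. Thus
the above Lemma, together with Theorem 9 and Proposition 12, immediately gives the following

Corollary 14. Assume that $\zeta$ and $\psi$ are Young functions, $\tilde\psi(x) = \psi(x)/x$ is strictly increasing,
$\tilde\psi(0) = 0$, $\tilde\psi(\infty) = \infty$. Let the function $\kappa$ be defined by (8). If $\varphi$ is a Young function such that
$\varphi\preceq\kappa$, then there exists $K < \infty$, such that for any Harris ergodic Markov chain $(X_n)$ on $\mathcal{X}$,
small set $C$, small measure $\nu$ and $f\colon\mathcal{X}\to\mathbb{R}$,

\[ \|S(f)\|_{\pi,\varphi} \le K\,\|\tau+1\|_{\nu,\psi}\big(1 + \delta\pi(C)\|\tau+1\|_{\nu,\psi}\big)\|f\|_{\pi,\zeta}. \qquad (10) \]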

Remark For slowly growing functions $\psi$ and $\zeta$ there may be no Orlicz function $\varphi$ such that
$\varphi\preceq\kappa$. This is not surprising since, as we will see from the construction presented in Section
3.1, the $\pi$-integrability of $S(f)$ is closely related to integrability of functions from a point-wise
product of Orlicz spaces. In consequence $S(f)$ may not even be integrable.

We have the following optimality result corresponding to Corollary 14. Its proof will be
presented in the next section.

Proposition 15. Assume that $\zeta$ and $\psi$ are Young functions, $\tilde\psi(x) = \psi(x)/x$ is strictly increasing,
$\tilde\psi(0) = 0$, $\tilde\psi(\infty) = \infty$. Let the function $\kappa$ be defined by (8) and let $\varphi$ be a Young function
such that for every ergodic Markov chain $(X_n)$, small set $C$, small measure $\nu$ and $f\colon\mathcal{X}\to\mathbb{R}$
with $\|\tau\|_{\nu,\psi} < \infty$ and $\|f\|_{\pi,\zeta} < \infty$, we have $\|S(f)\|_{\pi,\varphi} < \infty$. Then $\varphi\preceq\kappa$.

Remark By convexity of $\varphi$, the condition $\varphi\preceq\kappa$ holds iff there exists a constant $K < \infty$ and
$x_0 > 0$, such that

\[ K\varphi^{-1}(x) \ge \tilde\psi^{-1}(x)\zeta^{-1}(x) \]

for $x > x_0$. Thus under the assumptions that $\tilde\psi$ is strictly increasing, $\tilde\psi(0) = 0$, $\tilde\psi(\infty) = \infty$,
the above condition characterizes the triples of Young functions such that $\|f\|_{\pi,\zeta} < \infty$ implies
$\|S(f)\|_{\pi,\varphi} < \infty$ for all Markov chains with $\|\tau\|_{\nu,\psi} < \infty$.
Examples Just as in the previous section we will now present some concrete formulas for
classical Young functions, some of which will be used in Section [3] to derive tail inequalities for 
additive functionals of stationary Markov chains. 

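1. If $\varphi(x) = x^p$ and $\psi(x) = x^r$, where $r > p + 1 > 2$, then $\zeta_{\varphi,\psi}(x) \simeq x^{p(r-1)/(r-p-1)}$.

2. If $\varphi(x) = \exp(x^\alpha) - 1$ and $\psi(x) = \exp(x^\beta) - 1$, where $\beta > \alpha$, then $\zeta_{\varphi,\psi}(x) \simeq \exp(x^{\alpha\beta/(\beta-\alpha)}) - 1$.

3. If $\varphi(x) = x^p$ and $\psi(x) = \exp(x^\beta) - 1$, where $\beta > 0$, then $\zeta_{\varphi,\psi}(x) \simeq x^p\log^{p/\beta}x$.

4. If $\psi(x) = x^r$ and $\zeta(x) = x^p$ ($r > 2$, $p > (r-1)/(r-2)$), then $\varphi(x) \simeq x^{(r-1)p/(r+p-1)}$.

5. If $\psi(x) = \exp(x^\beta) - 1$ and $\zeta(x) = \exp(x^\alpha) - 1$ ($\alpha,\beta > 0$), then $\varphi(x) \simeq \exp(x^{\alpha\beta/(\alpha+\beta)}) - 1$.

6. If $\psi(x) = \exp(x^\beta) - 1$ ($\beta > 0$) and $\zeta(x) = x^p$ ($p > 1$), then $\varphi(x) \simeq \dfrac{x^p}{\log^{p/\beta}x}$.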

3 Proofs of optimality 

3.1 Main counterexample 

We will now introduce a general construction of a Markov chain which will serve as an example 
in proofs of all our optimality theorems. 

Let $S$ be a Polish space and let $\alpha$ be a Borel probability measure on $S$. Consider two
functions $\tilde f\colon S\to\mathbb{R}$ and $h\colon S\to\mathbb{N}\setminus\{0\}$. We will construct a Markov chain on some Polish
space $\mathcal{X}\supseteq S$, a small set $C\subseteq\mathcal{X}$, a probability measure $\nu$ and a function $f\colon\mathcal{X}\to\mathbb{R}$, possessing
the following properties.

Properties of the chain 

(i) The condition (1) is satisfied with $m = 1$ and $\delta = 1$ (in other words $C$ is an atom for the
chain),

(ii) $\nu(S) = 1$,

(iii) for any $x\in S$, $P_x(\tau + 1 = h(x)) = 1$,

(iv) for any $x\in S$, $P_x(S(f) = \tilde f(x)h(x)) = 1$,

(v) for any function $G\colon\mathbb{R}\to\mathbb{R}$ we have

\[ E_\nu G(S(f)) = R\int_S G(\tilde f(x)h(x))h(x)^{-1}\alpha(dx) \]

and

\[ E_\nu G(\tau+1) = R\int_S G(h(x))h(x)^{-1}\alpha(dx), \]

where $R = \big(\int_S h(y)^{-1}\alpha(dy)\big)^{-1}$,

(vi) $(X_n)$ admits a unique stationary distribution $\pi$ and the law of $f$ under $\pi$ is the same as
the law of $\tilde f$ under $\alpha$,

(vii) for any nondecreasing function $F\colon\mathbb{R}_+\to\mathbb{R}_+$,

\[ E_\pi F(|S(f)|) \ge \frac{1}{2}\int_S F(h(x)|\tilde f(x)|/2)\alpha(dx), \]

(viii) if $\alpha(\{x\colon h(x) = 1\}) > 0$ then the chain is Harris ergodic.

Construction of the chain Let $\mathcal{X} = \bigcup_{n=1}^{\infty}\{x\in S\colon h(x)\ge n\}\times\{n\}$. As a disjoint union it
clearly possesses a natural structure of a Polish space inherited from $S$. Formally $S\not\subseteq\mathcal{X}$ but it
does not pose a problem as we can clearly identify $S$ with $S\times\{1\} = \{x\in S\colon h(x)\ge 1\}\times\{1\}$.
The dynamics of the chain will be very simple.

• If $X_n = (x,i)$ and $h(x) > i$, then with probability one $X_{n+1} = (x,i+1)$.

• If $X_n = (x,i)$ and $h(x) = i$, then $X_{n+1} = (y,1)$, where $y$ is distributed according to the
probability measure

\[ \nu(dy) = Rh(y)^{-1}\alpha(dy). \qquad (11) \]

More formally, the transition function of the chain is given by

\[ P((x,i),A) = \begin{cases} \delta_{(x,i+1)}(A) & \text{if } i < h(x), \\ \nu(\{y\in S\colon (y,1)\in A\}) & \text{if } i = h(x). \end{cases} \]

In other words the chain describes a particle, which after departing from a point $(x,1)\in S$
changes its 'level' by jumping deterministically to points $(x,2),\ldots,(x,h(x))$ and then goes back
to 'level' one by selecting the first coordinate according to the measure $\nu$.
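
The dynamics just described are easy to simulate when $S$ is finite. The sketch below is only an illustration: the three-point space, the weights `alpha`, the functions `h` and `f_tilde`, and all helper names are hypothetical choices made for this example. It builds $\nu$ from formula (11), implements one transition of the chain, and follows the deterministic climb from $(x,1)$ up to the atom.

```python
from fractions import Fraction
import random

# Hypothetical finite example: S = {'a', 'b', 'c'}.
alpha = {'a': Fraction(1, 2), 'b': Fraction(1, 3), 'c': Fraction(1, 6)}
h = {'a': 1, 'b': 2, 'c': 3}
f_tilde = {'a': 5, 'b': -1, 'c': 2}


def small_measure(alpha, h):
    """Return (R, nu) with nu(y) = R * alpha(y) / h(y), as in formula (11)."""
    R = 1 / sum(a / h[x] for x, a in alpha.items())
    return R, {x: R * a / h[x] for x, a in alpha.items()}


def step(state, h, nu, rng):
    """One transition: climb one level, or restart at level 1 from nu when on the atom."""
    x, i = state
    if i < h[x]:
        return (x, i + 1)
    points = list(nu)
    weights = [float(nu[y]) for y in points]
    return (rng.choices(points, weights=weights)[0], 1)


def excursion(x, h, f_tilde):
    """Start at (x, 1); return (tau, S(f)) where tau is the first hitting time of the atom."""
    level, total, tau = 1, 0, 0
    while True:
        total += f_tilde[x]          # f((x, i)) = f_tilde(x) on every level
        if level == h[x]:            # the chain sits on the atom C
            return tau, total
        level += 1
        tau += 1


R, nu = small_measure(alpha, h)
rng = random.Random(0)
```

Calling `excursion('c', h, f_tilde)` climbs through levels 1, 2, 3 and so returns $\tau = h(x) - 1$ together with the sum $\tilde f(x)h(x)$, the behaviour asserted in properties (iii) and (iv).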

Clearly $\nu(S) = 1$ and so condition (ii) is satisfied. Note that $\alpha$ and $\nu$ are formally measures
on $S$, but we may and will sometimes treat them as measures on $\mathcal{X}$.

Let now $C = \{(x,i)\in\mathcal{X}\colon h(x) = i\}$. Then $P((x,i),A) = \nu(A)$ for any $(x,i)\in C$ and a
Borel subset $A$ of $\mathcal{X}$, which shows that (1) holds with $m = 1$ and $\delta = 1$.

Let us now prove condition (iii). Since $C$ is an atom for the chain, $Y_n = 1$ iff $X_n\in C$.
Moreover if $X_0 = (x,1)\simeq x\in S$, then $X_i = (x,i+1)$ for $i+1\le h(x)$ and $\tau = \inf\{i\ge 0\colon X_i\in C\}
= \inf\{i\ge 0\colon i+1 = h(x)\} = h(x) - 1$, which proves property (iii).

To assure that property (iv) holds it is enough to define

\[ f((x,i)) = \tilde f(x), \]

since then $X_0 = (x,1)$ implies that $f(X_n) = \tilde f(x)$ for $n\le\tau$.

Condition (v) follows now from properties (ii), (iii) and (iv) together with formula (11).
We will now pass to conditions (vi) and (vii).
We will now pass to conditions (vi) and (vii). 
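By the construction of the chain it is easy to prove that the chain admits a unique stationary
measure $\pi$ given by

\[ \pi(A\times\{k\}) = \alpha(A)n^{-1} \]

for $A\subseteq\{x\in S\colon h(x) = n\}$ and any $k\le n$. Thus for any Borel set $B\subseteq\mathbb{R}$ we have

\[
\begin{aligned}
\pi(\{(x,i)\in\mathcal{X}\colon f((x,i))\in B\}) &= \pi(\{(x,i)\in\mathcal{X}\colon \tilde f(x)\in B\}) \\
&= \sum_{n\ge 1}\sum_{k\le n}\pi(\{x\in S\colon h(x) = n, \tilde f(x)\in B\}\times\{k\}) \\
&= \sum_{n\ge 1} n\cdot n^{-1}\alpha(\{x\in S\colon h(x) = n, \tilde f(x)\in B\}) \\
&= \alpha(\{x\in S\colon \tilde f(x)\in B\}).
\end{aligned}
\]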
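
On a finite example, stationarity of $\pi$ can be verified exactly. The sketch below is illustrative only: the space, `alpha`, `h` and the helper `transition` are hypothetical, and the check is a direct computation of $\pi P$ with rational arithmetic rather than anything taken from the article.

```python
from fractions import Fraction

# Hypothetical finite example: S = {'a', 'b', 'c'}.
alpha = {'a': Fraction(1, 2), 'b': Fraction(1, 3), 'c': Fraction(1, 6)}
h = {'a': 1, 'b': 2, 'c': 3}

R = 1 / sum(alpha[x] / h[x] for x in alpha)
nu = {x: R * alpha[x] / h[x] for x in alpha}           # formula (11)

states = [(x, k) for x in alpha for k in range(1, h[x] + 1)]
pi = {(x, k): alpha[x] / h[x] for (x, k) in states}    # pi(A x {k}) = alpha(A) / n


def transition(s, t):
    """P(s, {t}) for the chain of the main construction."""
    (x, i), (y, j) = s, t
    if i < h[x]:
        return Fraction(int(t == (x, i + 1)))          # deterministic climb
    return nu[y] if j == 1 else Fraction(0)            # restart from nu at level 1


pi_next = {t: sum(pi[s] * transition(s, t) for s in states) for t in states}
```

Here `pi_next` coincides with `pi`, and the total mass is one because each $x$ contributes $h(x)$ levels of mass $\alpha(x)/h(x)$.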

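As for (vii), $X_0 = (x,i)$ implies that

\[ S(f) = (h(x) - i + 1)\tilde f(x). \]

Thus, letting $A_{n,k} = \{(x,k)\in\mathcal{X}\colon h(x) = n\}$, $B_n = \{x\in S\colon h(x) = n\}$, we get

\[
\begin{aligned}
E_\pi F(|S(f)|) &= \int_{\mathcal{X}} F((h(x) - i + 1)|\tilde f(x)|)\pi(d(x,i)) = \sum_{n,k}\int_{A_{n,k}} F((n - k + 1)|\tilde f(x)|)\pi(d(x,i)) \\
&= \sum_{n}\sum_{k\le n}\int_{B_n} n^{-1}F((n - k + 1)|\tilde f(x)|)\alpha(dx) \ge \frac{1}{2}\sum_{n}\int_{B_n} F(n|\tilde f(x)|/2)\alpha(dx) \\
&= \frac{1}{2}\int_S F(h(x)|\tilde f(x)|/2)\alpha(dx),
\end{aligned}
\]

proving (vii).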

Now we will prove (viii). Note that $A := \{x\in S\colon h(x) = 1\}\subseteq C$. Thus if $\alpha(A) > 0$ then
also $\nu(C) > 0$, which proves that the chain is strongly aperiodic (see e.g. Chapter 5 of [26] or
Chapter 2 of [29]). Moreover one can easily see that $\pi$ is an irreducibility measure for the chain
and the chain is Harris recurrent. Thus by Proposition 6.3 of [29] the chain is Harris ergodic (in
fact in [29] ergodicity is defined as aperiodicity together with positiveness and Harris recurrence
and Proposition 6.3 states that this is equivalent to convergence of n-step probabilities for any
initial point).

What remains to be proven is condition (i). Since $\pi(C) > 0$ we have $C\in\mathcal{E}^+$, whereas
inequality (1) for $m = \delta = 1$ is satisfied by the construction.
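
For this chain the identity $E_\nu(\tau+1) = \delta^{-1}\pi(C)^{-1}$, used in the proof of Theorem 2, can be checked by hand: $\delta = 1$, $\pi(C) = \int_S h^{-1}\,d\alpha = R^{-1}$, and $E_\nu(\tau+1) = \int_S h\,d\nu = R$. A small exact computation on a hypothetical finite example (the data `alpha` and `h` are illustrative assumptions, not taken from the article) confirms this.

```python
from fractions import Fraction

# Hypothetical finite example: S = {'a', 'b', 'c'}.
alpha = {'a': Fraction(1, 2), 'b': Fraction(1, 3), 'c': Fraction(1, 6)}
h = {'a': 1, 'b': 2, 'c': 3}

R = 1 / sum(alpha[x] / h[x] for x in alpha)
nu = {x: R * alpha[x] / h[x] for x in alpha}            # formula (11)

# The atom C consists of the top levels (x, h(x)); each carries pi-mass alpha(x) / h(x).
pi_C = sum(alpha[x] / h[x] for x in alpha)

# Started from (x, 1) the chain reaches C after exactly h(x) - 1 steps, so tau + 1 = h(x).
mean_tau_plus_one = sum(nu[x] * h[x] for x in alpha)
```

Both `mean_tau_plus_one` and `1 / pi_C` equal `R`, which is the case $\delta = 1$ of the identity.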



3.2 The chain started from v 

We will start with the proof of Proposition [3J The chain constructed above will allow us to 
reduce it to elementary techniques from the theory of Orlicz spaces. 

Proof of Proposition [3J Assume that the function p does not satisfy the condition p^^ -< p. 
Thus there exists a sequence of numbers x n — > oo such that 

p(x n ) < p lf ^(x n 2~ n ). 
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By the definition of p v ^ this means that there exists a sequence t n > such that 

ip{x n t n 2- n ) ip{t n ) 
_ p{x n ) + 



One can assume that i n > 2. Indeed, for all n large enough if t n < 2, then 

w((j; n 2- 1 )2-( n - 1 ) • 2) w(x n t n 2-™) , . , . 1n d>(2) 

y^Ujx ^ , > , > ^ > 2/>(x n 2- 1 ) > p{x n 2~ x ) + 

Z t n Z 

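Set $r_n = \lfloor t_n \rfloor$ for $n \ge 1$ and $r_0 = 1$. We have for $n \ge 1$

$$\frac{\varphi(x_n 2^{-n+1} r_n)}{r_n} \ge \frac{\varphi(x_n t_n 2^{-n})}{t_n} \ge \rho(x_n) + \frac{\psi(t_n)}{t_n} \ge \rho(x_n) + \frac{\psi(r_n)}{r_n} \ge 1, \qquad (12)$$

where in the last inequality we used assumption (A). Define now $p_n = C2^{-n}(\psi(r_n)/r_n + \rho(x_n))^{-1}$,
where $C$ is a constant such that $\sum_{n\ge 0} p_n = 1$. Consider a Polish space $S$ with a probability
measure $\alpha$, a partition $S = \bigcup_{n\ge 0} A_n$, $\alpha(A_n) = p_n$ and two functions $f$ and $h$, such that
$f(x) = x_n$ and $h(x) = r_n$ for $x \in A_n$.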

Let $(X_n)_{n\ge 0}$ be the Markov chain obtained by applying to $S$, $f$ and $h$ the main construction
introduced in Section 3.1. By property (viii) and the condition $r_0 = 1$, the chain is Harris
ergodic. By property (v) we have

$$E_\nu \psi(\tau + 1) = R\int_S \psi(h(x))h(x)^{-1}\,\alpha(dx) = R\sum_{n\ge 0}\frac{\psi(r_n)}{r_n}p_n \le 2RC$$

by the definition of $p_n$. Thus the chain $(X_n)$ satisfies $\|\tau\|_{\nu,\psi} < \infty$.
By property (vi) we get

$$E_\pi \rho(f) = \int_S \rho(f(x))\,\alpha(dx) = \sum_{n\ge 0}\rho(x_n)p_n \le 2C.$$



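On the other hand for any $\theta > 0$ we have by property (v), the construction of functions $f, h$
and (12),

$$E_\nu \varphi(\theta|S(f)|) = R\int_S \varphi(\theta|f(x)|h(x))h(x)^{-1}\,\alpha(dx) \ge R\sum_{n\ge 1}\frac{\varphi(\theta x_n r_n)}{r_n}p_n$$
$$\ge R\sum_{n\colon 2^{n-1}\theta\ge 1} 2^{n-1}\theta\,\frac{\varphi(x_n 2^{-n+1} r_n)}{r_n}p_n \ge R\sum_{n\colon 2^{n-1}\theta\ge 1} 2^{n-1}\theta\Big(\rho(x_n) + \frac{\psi(r_n)}{r_n}\Big)p_n = \infty,$$

which shows that $\|S(f)\|_{\nu,\varphi} = \infty$ and proves the proposition. □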
Proof of Theorem 4. Let $S$ be a Polish space, $\alpha$ a probability measure on $S$ and $f\colon S \to \mathbb{R}$ a
function whose law under $\alpha$ is the same as the law of $Y$.

We will consider in detail only the case when $\lim_{x\to\infty}\varphi(x)/x = \infty$. It is easy to see using
formula (3) and the construction below that in the case $\varphi \simeq \mathrm{id}$ the theorem also holds (note
that in this case also $\rho \simeq \mathrm{id}$).

By the convexity assumption and Lemma 22 from the Appendix, we obtain that $\eta = (\psi^*)^{-1}\circ\varphi^*$
is equivalent to a Young function. Thus by Proposition 5 and Lemma 21 from the Appendix
we get

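By Lemma 19 in the Appendix (or in the case when $\rho^* \simeq \mathrm{id}$ by the well known facts about the
spaces $L_1$ and $L_\infty$) there exists a function $g\colon S \to [0,\infty)$ such that

$$\int_S \frac{\varphi^*(g(x))}{\psi^{-1}(\varphi^*(g(x)))}\,\alpha(dx) < \infty \quad\text{and}\quad \int_S |f(x)|g(x)\,\alpha(dx) = \infty. \qquad (14)$$

Define the function $h\colon S \to \mathbb{N}\setminus\{0\}$ by $h(x) = \lfloor\psi^{-1}(\varphi^*(g(x)))\rfloor + 1$.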

Let now $\mathcal{X}$, $(X_n)$ and $f$ be the Polish space, Markov chain and function obtained from
$S, \alpha, f, h$ according to the main construction of Section 3.1. Note that we can assume that
$\alpha(\{x\colon h(x) = 1\}) > 0$ and thus by property (viii) this chain is Harris ergodic.

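Note that by the definition of $h$, if $h(x) \ge 2$ then $h(x) \le 2\psi^{-1}(\varphi^*(g(x)))$. Thus by property
(v) and (14) we get

$$E_\nu \psi((\tau + 1)/2) = R\int_S \psi(h(x)/2)h(x)^{-1}\,\alpha(dx) \le R\psi(1/2) + R\int_S \frac{\varphi^*(g(x))}{\psi^{-1}(\varphi^*(g(x)))}\,\alpha(dx) < \infty,$$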



which implies that $\|\tau\|_{\nu,\psi} < \infty$. Recall now the definition of $\nu$ given in (11). For all $a > 0$ we
have

$$E_\nu \varphi(S(f)/a) = \int_S \varphi(f(x)h(x)/a)\,\nu(dx),$$

which implies that $\|S(f)\|_{\nu,\varphi} < \infty$ iff $\|fh\|_{\nu,\varphi} < \infty$ (note that on the left hand side $\nu$ is treated
as a measure on $\mathcal{X}$ and on the right hand side as a measure on $S$).
Note however that by (14) we have

$$\int_S \varphi^*(g(x))\,\nu(dx) = R\int_S \frac{\varphi^*(g(x))}{h(x)}\,\alpha(dx) \le R\int_S \frac{\varphi^*(g(x))}{\psi^{-1}(\varphi^*(g(x)))}\,\alpha(dx) < \infty,$$

which gives $\|g\|_{\nu,\varphi^*} < \infty$, but

$$\int_S |f(x)|h(x)g(x)\,\nu(dx) = R\int_S |f(x)|g(x)\,\alpha(dx) = \infty.$$

This shows that $\|fh\|_{\nu,\varphi} = \infty$ and ends the proof. □
Proof of Proposition 7. Note that for any function $f$ (not necessarily equivalent to a Young
function) we have $f^{**} \le f$. Thus by Proposition 5 we have

/w <p ^ ((rr'of*)* <p ^ p*< (rr l °<p* r°p* ^ ^< w>*°pt, 

which ends the proof by Proposition 3. □

3.3 The stationary case 

For the proofs of results concerning optimality of our estimates for the chain started from $\pi$,
we will also use the general construction from Section 3.1. As already mentioned, in this case
the problem turns out to be closely related to the classical theory of point-wise multiplication
of Orlicz spaces (we refer to [32] for an overview).

Proof of Proposition 10. Assume that the function $\zeta$ does not satisfy the condition $\zeta_{\varphi,\psi} \preceq \zeta$.
Thus there exists a sequence of numbers $x_n \to \infty$, such that $\zeta(x_n) < \zeta_{\varphi,\psi}(x_n 2^{-n})$, i.e. for some
sequence $t_n > 0$, $n = 1, 2, \ldots$, we have

$$\varphi(x_n 2^{-n} t_n) \ge \zeta(x_n) + \psi(t_n)/t_n.$$

Similarly as in the proof of Proposition 3 we show that without loss of generality we can assume
that $t_n$ are positive integers and thus the right hand side above is bounded from below. Let us
additionally define $t_0 = 1$.

Let $p_n = C2^{-n}(\zeta(x_n) + \psi(t_n)/t_n)^{-1}$, where $C$ is such that $\sum_{n\ge 0} p_n = 1$ and consider a
probability space $(S, \alpha)$, where $S = \bigcup_n A_n$ with $A_n$ disjoint and $\alpha(A_n) = p_n$ together with two
functions $f\colon S \to \mathbb{R}$ and $h\colon S \to \mathbb{N}$ such that for $x \in A_n$, we have $f(x) = x_n$, $h(x) = t_n$.

By applying to $S$, $f$ and $h$ the general construction of Section 3.1 we get a Harris ergodic
Markov chain and a function $f$, which by properties (v) and (vi) satisfy

$$E_\nu \psi(\tau + 1) = R\sum_{n\ge 0}\frac{\psi(t_n)}{t_n}p_n \le 2RC,$$

$$E_\pi \zeta(f) = \sum_{n\ge 0}\zeta(x_n)p_n \le 2C.$$

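But by property (vii) we get for any $\theta > 0$,

$$E_\pi \varphi(\theta|S(f)|) \ge \frac{1}{2}\int_S \varphi(\theta h(x)|f(x)|/2)\,\alpha(dx) = \frac{1}{2}\sum_{n\ge 0}\varphi(\theta x_n t_n/2)p_n$$
$$\ge \frac{1}{2}\sum_{n\ge 1}\varphi(2^{n-1}\theta x_n t_n 2^{-n})p_n \ge \frac{1}{2}\sum_{n\colon 2^{n-1}\theta\ge 1} 2^{n-1}\theta\,\varphi(x_n t_n 2^{-n})p_n$$
$$\ge \frac{1}{2}\sum_{n\colon 2^{n-1}\theta\ge 1} 2^{n-1}\theta\big(\zeta(x_n) + \psi(t_n)/t_n\big)p_n = \infty,$$

which ends the proof. □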
Proof of Theorem 11. Consider first the case $\lim_{x\to\infty}\eta(x)/x = \infty$.
We will show that for some constant $C$ and $x$ large enough we have

$$\varphi^{-1}(x) \le C\zeta^{-1}(x)\tilde\psi^{-1}(x), \qquad (15)$$

where $\tilde\psi(x) = \psi(x)/x$. We have $\eta^{-1}(x) = \tilde\psi^{-1}(\varphi(x))$ and thus by the assumption on $\eta$ and Lemma 21 from the
Appendix, we get $\varphi^{-1}(x) \le C(\eta^*)^{-1}(\varphi^{-1}(x))\tilde\psi^{-1}(x)$ for some constant $C < \infty$ and $x$ large enough.
But by Proposition 12, $(\eta^*)^{-1}(\varphi^{-1}(x)) \le 2\zeta^{-1}(x)$ and thus (15) follows.

If $\lim_{x\to\infty}\eta(x)/x < \infty$ then (15) also holds if we interpret $\zeta^{-1}$ as the generalized inverse
(note that in this case $L_\zeta = L_\infty$).

Theorem 1 from [25] states that if $\varphi, \zeta, \tilde\psi$ are Young functions such that (15) holds for all
$x \in [0, \infty)$ and $Y$ is a random variable such that $\|Y\|_\zeta = \infty$, then there exists a random variable
$X$, such that $\|X\|_{\tilde\psi} < \infty$ and $\|XY\|_\varphi = \infty$. One can easily see that the functions $\varphi, \tilde\psi$ can be
modified (for small values of $x$) to Young functions such that (15) holds for all $x \ge 0$. Thus there
exists $X$ satisfying the above condition. Clearly one can assume that with probability one $X$ is
a positive integer and $P(X = 1) > 0$. Consider now a Polish space $(S, \alpha)$ and $f, h\colon S \to \mathbb{R}$ such
that $(f, h)$ is distributed like $(Y, X)$. Let $(X_n)$ be the Markov chain given by the construction
of Section 3.1. By property (v) we have

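$$E_\nu \psi\Big(\frac{\tau + 1}{a}\Big) = R\int_S \psi\Big(\frac{h(x)}{a}\Big)h(x)^{-1}\,\alpha(dx) = \frac{R}{a}E\tilde\psi\Big(\frac{X}{a}\Big) < \infty$$

for $a$ large enough, since $\|X\|_{\tilde\psi} < \infty$. By property (vi), the law of $f$ under $\pi$ is equal to the law
of $Y$. Finally, by property (vii), for every $a > 0$,

$$E_\pi \varphi\Big(\frac{|S(f)|}{a}\Big) \ge \frac{1}{2}E\varphi\Big(\frac{|XY|}{2a}\Big) = \infty,$$

which proves that $\|S(f)\|_{\pi,\varphi} = \infty$. □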
Proof of Proposition [731 Let r] = ip' 1 o ip. By Propositions 1101 1121 and Lemma [13] we have 

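where $\vartheta = \kappa^{-1}\circ\tilde\psi$. Thus $(\vartheta^*)^{-1}\circ\kappa^{-1} \preceq (\eta^*)^{-1}\circ\varphi^{-1}$. Another application of Lemma 13 together
with Lemma 21 in the Appendix yield for some constant $C \in (1, \infty)$ and $x$ large enough,

$$\kappa^{-1}(x) \le C(\vartheta^*)^{-1}(\kappa^{-1}(x))\tilde\psi^{-1}(x) \le C^2(\eta^*)^{-1}(\varphi^{-1}(Cx))\tilde\psi^{-1}(Cx)$$
$$= C^2(\eta^*)^{-1}(\varphi^{-1}(Cx))\eta^{-1}(\varphi^{-1}(Cx)) \le 2C^2\varphi^{-1}(Cx),$$

which implies that $\varphi \preceq \kappa$. □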

4 Applications 

4.1 Limit theorems for additive functionals 

It is well known that for a Harris ergodic Markov chain and a function $f$, the CLT

$$\frac{f(X_0) + \ldots + f(X_{n-1})}{\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, \sigma_f^2) \qquad (16)$$

holds in the stationary case iff it holds for any initial distribution.

Moreover (see [12] and [6]) under the assumption that $E_\pi f^2 < \infty$, the above CLT holds iff
$E_\nu S(f) = 0$, $E_\nu (S(f))^2 < \infty$ and the asymptotic variance is given by $\sigma_f^2 = \delta\pi(C)m^{-1}(Es_1(f)^2 +
2Es_1(f)s_2(f))$. If the chain has an atom, this equivalence holds without the assumption $E_\pi f^2 <
\infty$.

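It is also known (see [12]) that the condition $E_\pi f = 0$, $E_\nu (S(|f|))^2 < \infty$ implies the law of the
iterated logarithm

$$-\sigma_f = \liminf_{n\to\infty}\frac{\sum_{i=0}^{n-1} f(X_i)}{\sqrt{n\log\log n}} \le \limsup_{n\to\infty}\frac{\sum_{i=0}^{n-1} f(X_i)}{\sqrt{n\log\log n}} = \sigma_f \quad \text{a.s.} \qquad (17)$$

Moreover for chains with an atom $\limsup_{n\to\infty}\frac{|\sum_{i=0}^{n-1} f(X_i)|}{\sqrt{n\log\log n}} < \infty$ a.s. implies the CLT (see [12],
Theorem 2.2 and Remark 2.3).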

Our results from Section 2.1 can thus be applied to give optimal conditions for the CLT and LIL
in terms of ergodicity of the chain (expressed by Orlicz integrability of the regeneration time)
and integrability of $f$ with respect to the stationary measure.

The following theorem is an immediate consequence of Theorems 2, 4 and Proposition 3.

Theorem 16. Consider a Harris ergodic Markov chain $(X_n)$ on a Polish space $\mathcal{X}$ and a function
$f\colon \mathcal{X} \to \mathbb{R}$, $E_\pi f = 0$. Let $\psi$ be a Young function such that $\lim_{x\to 0}\psi(x)/x = 0$ and assume that
$\|\tau\|_{\nu,\psi} < \infty$. Let finally $\rho(x) = \tilde\psi^*(x^2)$, where $\tilde\psi(x) = \psi(x)/x$. If $\|f\|_{\pi,\rho} < \infty$ then the CLT (16)
and LIL (17) hold.

Moreover every Young function $\tilde\rho$ such that $\|f\|_{\pi,\tilde\rho} < \infty$ implies CLT (or LIL) for all Harris
ergodic Markov chains with $\|\tau\|_{\nu,\psi} < \infty$ satisfies $\rho \preceq \tilde\rho$.

If the function $x \mapsto \sqrt{\psi(x)}$ is equivalent to a Young function then for every random variable
$Y$ with $\|Y\|_\rho = \infty$ one can construct a stationary Harris ergodic Markov chain $(X_n)$ and a
function $f$ such that $f(X_n)$ has the same law as $Y$, $\|\tau\|_{\nu,\psi} < \infty$ and both (16) and (17) fail.

Remark. As noted in [20], in the case of geometric ergodicity, i.e. when $\psi(x) = \exp(x) - 1$,
the CLT part of the above theorem can be obtained from results in [16], giving very general and
optimal conditions for the CLT under $\alpha$-mixing. The integrability condition for $f$ is in this case
$E_\pi f^2\log_+(|f|) < \infty$. The sufficiency for the LIL part can be similarly deduced from [36].
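To see where the condition $E_\pi f^2\log_+(|f|) < \infty$ comes from, one can evaluate $\rho(x) = \tilde\psi^*(x^2)$ numerically for $\psi(x) = \exp(x) - 1$ and compare it with $x^2\log x$. The sketch below is only an illustration: the grid search for the Legendre transform, the cut-off `y_max` and the function names are our assumptions, and only the behaviour for large $x$ is meaningful (near zero $\psi$ would have to be modified to satisfy $\lim_{x\to 0}\psi(x)/x = 0$).

```python
import math


def psi_tilde(y):
    """psi(y)/y for psi(y) = exp(y) - 1, extended by its limit 1 at y = 0."""
    return math.expm1(y) / y if y > 0 else 1.0


def conjugate(s, y_max=60.0, steps=60000):
    """Crude grid approximation of sup_{y >= 0} (s*y - psi_tilde(y))."""
    best = -psi_tilde(0.0)
    for k in range(1, steps + 1):
        y = y_max * k / steps
        best = max(best, s * y - psi_tilde(y))
    return best


def rho(x):
    """rho(x) = conjugate of psi_tilde evaluated at x^2."""
    return conjugate(x * x)


def ratio_to_x2_log_x(x):
    """rho(x) / (x^2 log x); it stays bounded for the values tried here."""
    return rho(x) / (x * x * math.log(x))
```

For $x = 10, 100, 1000$ the ratio stays between two fixed constants, which is consistent with $\rho(x)$ being comparable to $x^2\log x$ for large $x$.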

The equivalence of the exponential decay of mixing coefficients with ergodicity of Markov
chains (measured in terms of $\psi$) follows from [30, 31]. Optimality of the condition follows from
examples given in [10]. Examples of geometrically ergodic Markov chains and a function $f$
such that $E_\pi f^2 < \infty$ and the CLT fails have been also constructed in [18, 11]. Let us point
out that if the Markov chain is reversible and geometrically ergodic, then $\|f\|_{\pi,2} < \infty$ implies
the CLT and thus also $E_\nu S(f)^2 < \infty$. Thus under this additional assumption our formulas
for $\psi(x) = \exp(x) - 1$ and $\varphi(x) = x^2$ are no longer optimal (our example from Section 3.1
is obviously non-reversible). It would be of interest to derive counterparts of theorems from
Section 2 under the assumption of reversibility.

It is possible that in a more general case Theorem 16 can also be recovered from the above
results, by proper characterizations of ergodicity in terms of mixing and characterizations of
Orlicz spaces in terms of some weighted inequalities involving the tail of the function. However
we have not attempted to do this in full generality (we have only verified that such an approach
works in the case of $\psi(x) = x^p$).

Let us also remark that to our best knowledge, so far there has been no 'regeneration' proof
of Theorem 16 even in the case of geometric ergodicity.

Berry-Esseen type theorems. Similarly we can use a result by Bolthausen [8, 9] to derive
Berry-Esseen type bounds for additive functionals of stationary chains. More specifically Lemma
2 in [9], together with Theorem 2 give

Theorem 17. Let $(X_n)$ be a stationary strongly aperiodic Harris ergodic Markov chain on $\mathcal{X}$,
such that $\|\tau\|_{\nu,\psi} < \infty$, where $\psi$ is a Young function satisfying $(x \mapsto x^3) \preceq \psi$ and $\lim_{x\to 0}\psi(x)/x =
0$. Let $\rho = \Psi^*(x^3)$, where $\Psi(x) = \psi(\sqrt{x})/\sqrt{x}$. Then for every $f\colon \mathcal{X} \to \mathbb{R}$ such that $\|f\|_{\pi,\rho} < \infty$
and $\sigma_f^2 := E(S(f))^2 > 0$ we have



where $\Phi(x) = (2\pi)^{-1/2}\int_{-\infty}^{x}\exp(-y^2/2)\,dy$.



\ o~t\/n ) 



4.2 Tail estimates 

The last application we develop concerns tail inequalities for additive functionals. The approach 
we take is by now fairly standard (see e.g. [131 US El E El E2J, [23]) and relies on splitting the 
additive functional into a sum of independent (or one-dependent) blocks and using inequalities
for sums of independent random variables. Our results on Orlicz integrability imply inequalities
for the chain started from the small measure (an atom) or from the stationary distribution.
The former case may have potential applications in MCMC algorithms in situations when the small
measure is known explicitly and one is able to sample from it.
In what follows we denote $\psi_\alpha(x) = \exp(x^\alpha) - 1$.

Theorem 18. Let $(X_n)_{n\ge 0}$ be a Harris ergodic Markov chain on $\mathcal{X}$. Assume that $\|\tau\|_{\nu,\psi_\alpha} < \infty$
for some $\alpha \in (0, 1]$. Let $f\colon \mathcal{X} \to \mathbb{R}$ be a measurable function, such that $E_\pi f = 0$. If $\|f\|_{\pi,\psi_\beta} < \infty$
for some $\beta > 0$, then for all $t \ge 0$,



,(|/(* ) + ... + /(*n_i)|>t) 



-^ 6XP ( Kn8ir(C)E u S(f) 2 ) +KeW \ K\\f\L f Jr + if 



t 1 

+ K exp ' 



^(ll/lk#||r + l||^jTlogn 
and



P 7r (|/(X ) + ... + /(X n _ 1 )| >t) 



t 2 \ „ / t 



< K exp ( — — ^^/i^^9 ) + ^ eX P 



Kn5-K{C)E v S{f)V *\ ^ll/lk^llr + lll^ 

^(ll/IU^IIr + lll^jTlogdlT + lll^jJ + eXP VK(||/||^||r + l||^j7logn 



where 7 = and K depends only on a, (3 and m in the formula HJj. 



Remarks 

1. The proof of the above theorem is similar to those presented in [1, 2], therefore we will
present only a sketch.

2. When $m = 1$, $\delta\pi(C)E_\nu S(f)^2$ is the variance of the limiting Gaussian distribution for the
additive functional.

3. If one does not insist on having the limiting variance in the case $m = 1$ as the subgaussian
coefficient and instead replaces it by $E_\nu S(f)^2$, one can get rid of the second summand on
the right hand sides of the estimates (i.e. the summand containing $\|\tau + 1\|^3$).

4. One can also obtain similar results for suprema of empirical processes of a Markov chain
(or equivalently for additive functionals with values in a Banach space). The difference is
that one then obtains bounds on deviation above expectation and not from zero. The proof is
almost the same; it simply requires a suitable generalization of an inequality for real valued
summands, relying on the celebrated Talagrand's inequality and an additional argument
to take care of the expectation. Since our goal is rather to illustrate the consequences of
results from Section 2 than to provide the most general inequalities, we do not state the
details and refer the reader to [1, 2] for the special case of geometrically ergodic Markov
chains. For the same reason we will not try to evaluate constants in the inequalities.

5. In a similar way one can obtain tail estimates in the polynomial case (i.e. when the
regeneration time and/or the function $f$ are only polynomially integrable). One just
needs to use the other examples that have been discussed in Section 2. The estimate
of the bounded part (after truncation) comes again from Bernstein's inequality, whereas
the unbounded part can be handled with the Hoffmann-Jørgensen inequality (or its easy
modifications for functions of the form $x \mapsto x^p/\log^\beta x$), just as e.g. in [17].

Proof of Theorem 18. Below we will several times use known bounds for sums of independent
random variables in the one-dependent case. Clearly it may be done at the cost of worsening the
constants by splitting the sum into sums of odd and even terms, therefore we will just write the
final result without further comments. In the proof we will use the letter $K$ to denote constants
depending on $\alpha, \beta$. Their values may change from one occurrence to another.
Setting $N = \inf\{i\colon m\tau(i) + m - 1 \ge n - 1\}$ we may write

$$|f(X_0) + \ldots + f(X_{n-1})| \le \sum_{j=0}^{m\tau(0)+m-1}|f(X_j)| + \Big|\sum_{i=1}^{N} s_i(f)\Big| + \sum_{i=n}^{m\tau(N)+m-1}|f(X_i)| =: I + II + III,$$

where each of the sums on the right hand side may be interpreted as empty.
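The splitting into an initial segment, complete blocks and an overshoot can be illustrated on a deterministic sequence. The sketch below takes $m = 1$ and assumes that the $i$-th block $s_i(f)$ is the sum of $f(X_j)$ over $\tau(i-1) < j \le \tau(i)$; this indexing convention, the function name and the sample data are our assumptions made for illustration. Before taking absolute values, the decomposition is an exact identity: the initial segment plus the blocks minus the overshoot equals the partial sum.

```python
def split_by_regeneration(values, regen_times, n):
    """Split f(X_0) + ... + f(X_{n-1}) along regeneration times (case m = 1).

    values[j] plays the role of f(X_j); regen_times = [tau(0), tau(1), ...]
    is strictly increasing and must reach at least n - 1.
    Returns (initial_segment, list_of_block_sums, overshoot).
    """
    # N = inf{i : tau(i) >= n - 1}
    N = next(i for i, t in enumerate(regen_times) if t >= n - 1)
    initial = sum(values[: regen_times[0] + 1])
    blocks = [
        sum(values[regen_times[i - 1] + 1 : regen_times[i] + 1])
        for i in range(1, N + 1)
    ]
    # Terms with index n, ..., tau(N) were counted but are not in the partial sum.
    overshoot = sum(values[n : regen_times[N] + 1])
    return initial, blocks, overshoot


# Hypothetical data: alternating signs, regeneration at times 2, 5, 9, 14.
sample_values = [(-1) ** j * (j + 1) for j in range(15)]
sample_regen = [2, 5, 9, 14]
```

Applying the triangle inequality to this identity gives the bound by $I + II + III$, with the first and last terms replaced by sums of absolute values.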

The first and last terms can be taken care of by Chebyshev's inequalities corresponding to
proper Orlicz norms, using estimates of Corollaries 6 and 15 (note that $P(III \ge t) \le P(I \ge
t) + nP(|s_1(f)| \ge t)$).

We will consider only the case of the chain started from $\nu$. The stationary case is similar,
simply to bound $I$ we use the estimates of Orlicz norms for the chain started from $\pi$ given in
Theorem 9 (together with the remark following it to get better dependence on $\|\tau + 1\|_{\nu,\psi_\alpha}$).

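By Corollary 6 and examples provided in Section 2, $\|s_i(f)\|_{\psi_\gamma} = \|S(f)\|_{\nu,\psi_\gamma} \le K\|\tau +
1\|_{\nu,\psi_\alpha}\|f\|_{\pi,\psi_\beta}$. Thus

$$P(I \ge t) + P(III \ge t) \le 2n\exp\Big(-\Big(\frac{t}{K\|\tau + 1\|_{\nu,\psi_\alpha}\|f\|_{\pi,\psi_\beta}}\Big)^{\gamma}\Big). \qquad (18)$$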

The second term can be split into $II_1 + II_2$, where

$$II_1 = \Big|\sum_{i=1}^{N}\big(s_i(f)1_{\{|s_i(f)|\le a\}} - Es_i(f)1_{\{|s_i(f)|\le a\}}\big)\Big|,$$

$$II_2 = \Big|\sum_{i=1}^{N}\big(s_i(f)1_{\{|s_i(f)|> a\}} - Es_i(f)1_{\{|s_i(f)|> a\}}\big)\Big|.$$

Setting a = maxj< n ||si(/)||^ log 1/7 n < i^^mll/ll^ ||r + l\\^ a log 1/7 n, we can pro- 
ceed as in [2] to get 



P(" 2 >t)<2exp^-— j ). (19) 

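It remains to bound the term $II_1$. Introduce the variables $T_i = \tau(i) - \tau(i - 1)$, $i \ge 1$ and note
that $ET_i = \delta^{-1}\pi(C)^{-1}$. For $4nm^{-1}\pi(C)\delta \ge 2$, we have

$$P(N > 4nm^{-1}\pi(C)\delta) \le P\Big(\sum_{i=1}^{\lfloor 4nm^{-1}\pi(C)\delta\rfloor} T_i < n/m\Big)$$
$$\le P\Big(\sum_{i=1}^{\lfloor 4nm^{-1}\pi(C)\delta\rfloor}(T_i - ET_i) < n/m - 2nm^{-1}\pi(C)\delta\,ET_i\Big)$$
$$= P\Big(\sum_{i=1}^{\lfloor 4nm^{-1}\pi(C)\delta\rfloor}(T_i - ET_i) < -n/m\Big) \le k(n/m),$$

where $k(t) = K_\alpha\exp\big(-K_\alpha^{-1}\min\big(t^2m/(n\|\tau + 1\|^2_{\nu,\psi_\alpha}), (t/\|\tau + 1\|_{\nu,\psi_\alpha})^\alpha\big)\big)$. The bound follows for
$\alpha = 1$ from Bernstein's $\psi_1$ inequality and for $\alpha < 1$ from results in [19] (as shown in [5]).
Note that if $4nm^{-1}\pi(C)\delta < 2$, then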

k(n/m) > exp(— K a nm~ l \\T + > exp(— K a nm _1 (Er + 1)~ 2 ) 

= exp(— K a nmT l 5tt(C)) > exp(—K a /2). 

Thus the above tail estimate for $N$ remains true (after adjusting the constant $K_\alpha$).

Thus by Bernstein's bounds on suprema of partial sums of a sequence of independent random
variables we get

Wh >t)< HHi >t&N< 4nm _1 7r(C)5) + k(n/m) 

( 1 / t 2 t \\ 
<Kexp [-— min — — . _ — ))+k(n/m), (20) 

On the other hand by N < n/m, the same inequalities we used to derive the function k and 
Levy type inequalities (like in |JJ), we obtain 

W(H >t)< h(t), (21) 

where h(t) = exp(-K~ 1 min(t 2 m/n(||/||^||r + l||„,tf a ) 2 , (f/ll/lh^J T + ^k^J 7 )) 
Now if k{n/m) > K exp(-(t/K\\f\\ T ^\\T + l\\^J a ), then 

K exp ( — ( ij ij — ~\ ~\ < k{n/m) < K exp ^ 



K\\f\\^ p \\r + l\UJ ) ~ W '- V \ \Km\\r + l\\l^ a 



and so 



t 2 m t 

> 



"(ll/IU,^ll r + MlvrfJ 2 \\ T + MH^jj \M P 

which ends the proof by ([2T]) . □ 



Appendix. Some generalities on Orlicz Young functions and Orlicz spaces

All the lemmas presented below are standard facts from the theory of Orlicz spaces; we present
them here for the reader's convenience.

Lemma 19. If $\varphi$ is a Young function then $X \in L_\varphi$ if and only if $E|XY| < \infty$ for all $Y$ such
that $E\varphi^*(Y) \le 1$. Moreover the norm $\|X\| = \sup\{EXY\colon E\varphi^*(Y) \le 1\}$ is equivalent to $\|X\|_\varphi$.

The next lemma is a modification of Lemma 5.4. in [27]. In the original formulation it 
concerns the notion of equivalence of functions (and not asymptotic equivalence relevant in our 
probabilistic setting). One can however easily see that the proof from [27] yields the version 
stated below. 
Lemma 20. Consider two increasing continuous functions $F, G\colon [0, \infty) \to [0, \infty)$ with $F(0) =
G(0) = 0$, $F(\infty) = G(\infty) = \infty$. The following conditions are equivalent:

(i) $F \circ G^{-1}$ is equivalent to a Young function.

(ii) There exist positive constants $C, x_0$ such that

$$F \circ G^{-1}(sx) \ge C^{-1}s\,F \circ G^{-1}(x)$$

for all $s \ge 1$ and $x \ge x_0$.

(iii) There exist positive constants $C, x_0$ such that

$$\frac{F(sx)}{G(sx)} \ge C^{-1}\frac{F(x)}{G(x)}$$

for all $s \ge 1$, $x \ge x_0$.

Lemma 21. For any Young function $\psi$ such that $\lim_{x\to\infty}\psi(x)/x = \infty$ and any $x \ge 0$,

$$x \le (\psi^*)^{-1}(x)\psi^{-1}(x) \le 2x.$$

Moreover the right hand side inequality holds for any strictly increasing function $\psi\colon [0, \infty) \to
[0, \infty)$ with $\psi(0) = 0$, $\psi(\infty) = \infty$, $\lim_{x\to\infty}\psi(x)/x = \infty$.
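For the power functions $\psi(x) = x^p/p$ with $p > 1$ the two-sided bound can be checked in closed form: here $\psi^*(x) = x^q/q$ with $1/p + 1/q = 1$, so $(\psi^*)^{-1}(x)\psi^{-1}(x) = p^{1/p}q^{1/q}x$, and the constant $p^{1/p}q^{1/q}$ lies in $[1, 2]$, with the value $2$ attained at $p = 2$. The following minimal sketch verifies this numerically; the function names are ours and the example is only meant as an illustration of the lemma.

```python
def psi_inverse(x, p):
    """Inverse of psi(x) = x^p / p on [0, infinity)."""
    return (p * x) ** (1.0 / p)


def psi_conjugate_inverse(x, p):
    """Inverse of the conjugate psi*(x) = x^q / q, where 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    return (q * x) ** (1.0 / q)


def product_ratio(x, p):
    """Return (psi*)^{-1}(x) * psi^{-1}(x) / x, which should lie in [1, 2]."""
    return psi_conjugate_inverse(x, p) * psi_inverse(x, p) / x


def check_power_functions(exponents=(1.01, 1.5, 2.0, 3.0, 10.0),
                          points=(0.01, 1.0, 50.0)):
    """Check the two-sided bound on a small grid, up to rounding error."""
    tol = 1e-9
    return all(
        1.0 - tol <= product_ratio(x, p) <= 2.0 + tol
        for p in exponents
        for x in points
    )
```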

Lemma 22. Let $\varphi$ and $\psi$ be two Young functions. Assume that $\lim_{x\to\infty}\varphi(x)/x = \infty$. If $\varphi^{-1}\circ\psi$
is equivalent to a Young function, then so is $(\psi^*)^{-1}\circ\varphi^*$.

Proof. It is easy to see that under the assumptions of the lemma we also have $\lim_{x\to\infty}\psi(x)/x =
\infty$ and thus $\varphi^*(x)$, $\psi^*(x)$ are finite for all $x \ge 0$. Applying Lemma 20 with $F = \varphi^{-1}$, $G = \psi^{-1}$,
we get that

$$\frac{\varphi^{-1}(sx)}{\psi^{-1}(sx)} \ge C^{-1}\frac{\varphi^{-1}(x)}{\psi^{-1}(x)}$$

for some $C > 0$, all $s \ge 1$ and $x$ large enough. By Lemma 21 we obtain for $x$ large enough,

$$\frac{(\psi^*)^{-1}(sx)}{(\varphi^*)^{-1}(sx)} \ge (4C)^{-1}\frac{(\psi^*)^{-1}(x)}{(\varphi^*)^{-1}(x)},$$

which by another application of Lemma 20 ends the proof.

□